Thousands of Voices for HMM-based Speech Synthesis
نویسندگان
چکیده
Our recent experiments with HMM-based speech synthesis systems have demonstrated that speaker-adaptive HMM-based speech synthesis (which uses an ‘average voice model’ plus model adaptation) is robust to non-ideal speech data that are recorded under various conditions and with varying microphones, that are not perfectly clean, and/or that lack of phonetic balance. This enables us consider building high-quality voices on ’non-TTS’ corpora such as ASR corpora. Since ASR corpora generally include a large number of speakers, this leads to the possibility of producing an enormous number of voices automatically. In this paper we show thousands of voices for HMM-based speech synthesis that we have made from several popular ASR corpora such as the Wall Street Journal databases (WSJ0/WSJ1/WSJCAM0), Resource Management, Globalphone and Speecon. We report some perceptual evaluation results and outline the outstanding issues.
منابع مشابه
An investigation of the impact of speech transcript errors on HMM voices
Toward automatic creation of web-based voice fonts at low cost, automatic speech transcription technology is used to obtain the linguistic features for building HMM-based voices from audio web contents. This paper presents an investigation of the influences of erroneous transcripts on such voices. We simulate varied speech transcript errors by using a large vocabulary automatic speech recognize...
متن کاملSynthesis and evaluation of conversational characteristics in speech synthesis
Conventional synthetic voices can synthesise neutral read aloud speech well. But, to make synthetic speech more suitable for a wider range of applications, the voices need to express more than just the word identity. We need to develop voices that can partake in a conversation and express, e.g. agreement, disagreement, hesitation, in a natural and believable manner. In speech synthesis there ar...
متن کاملDetails of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005
In January 2005, an open evaluation of corpus-based textto-speech synthesis systems using common speech datasets, named Blizzard Challenge 2005, was conducted. Nitech group participated to this challenge with a newly designed HMM-based speech synthesis system (Nitech-HTS 2005). In the present paper, technical details, building processes, and the performance of the Nitech-HTS 2005 voices are des...
متن کاملA novel irregular voice model for HMM-based speech synthesis
State-of-the-art text-to-speech (TTS) synthesis is often based on statistical parametric methods. Particular attention is paid to hidden Markov model (HMM) based text-to-speech synthesis. HMM-TTS is optimized for ideal voices and may not produce high quality synthesized speech with voices having frequent non-ideal phonation. Such a voice quality is irregular phonation (also called as glottaliza...
متن کاملSpeech Synthesis Based on Hidden Markov Models and Deep Learning
Speech synthesis based on Hidden Markov Models (HMM) and other statistical parametric techniques have been a hot topic for some time. Using this techniques, speech synthesizers are able to produce intelligible and flexible voices. Despite progress, the quality of the voices produced using statistical parametric synthesis has not yet reached the level of the current predominant unit-selection ap...
متن کامل